-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up RemoteFX encoding #571
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!! Do you have some numbers from criterion? How much the performance was improved?
suggestion: Put the benchmarks in a separate crate ironrdp-benchmarks
or benchmarks
.
fwiw, I opened awxkee/yuvutils-rs#3 |
Thank you for following up on this. I'll watch the issue. |
Well, criterion is huge time saver depending on your HW, but it still uses same amount of compute:
what's the rationale to put tests and benchmark in various crates? We try to ensure a common framework for the various crates that way? |
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
In theory, this could help the compiler to unroll loops.. doesn't seem to be the case though, but it allows to drop the assert_eq!() at least. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
That seems to speed up a bit the code: rfxenc time: [46.040 µs 46.288 µs 46.698 µs] change: [-9.2580% -8.6663% -7.8304%] (p = 0.00 < 0.05) Performance has improved. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
That doesn't change the speed though, code isn't inlined afaict. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
This can help a lot wall-clock time, but depends on CPU. rfx_enc time: [9.7885 ms 10.123 ms 10.439 ms] change: [-80.484% -79.847% -79.208%] (p = 0.00 < 0.05) Performance has improved. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Apparently it already did, I do not observe perf improvements. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Unfortunately, that doesn't seem to help unrolling & vectorizing: no perf improvements. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
rgb2yuv time: [11.706 µs 11.716 µs 11.727 µs] change: [-24.083% -23.682% -23.394%] (p = 0.00 < 0.05) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Good numbers! By "criterion" here, you mean "rayon" for parallelizing the encoding?
For the tests, the rationale is explained at several places:
EDIT: Also a recommend read with similar points made: https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html I'll also add less objective opinions and personal tastes:
For the benchmarks, in addition to the above arguments:
We also have a single That's about it for the rationale behind this suggestion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Really nice work on the performance.
Unfortunately, it's not simple to use hand-written assembly from a project like yuvutils, because RDP uses odd bias and precision. Something left for the another day.